
Transaction and Concurrency


This note explains in more detail some topics about transactions and concurrency, such as deadlock handling and concurrency control schemes.


Deadlock Handling

In this chapter we will look at some mechanisms databases implement to prevent, mitigate, and handle deadlocks.

Deadlock Prevention

A deadlock prevention strategy is a way for the database to stop deadlocks from ever forming (it therefore acts before a deadlock happens). Five common deadlock prevention strategies are:

  • Pre-declaration: Require a transaction to acquire all the locks it will need before it begins executing.

  • Graph-based protocol: Impose a partial order on all data items, and force transactions to lock items following that order.

  • Wait-Die Scheme: When a transaction requests a lock held by another, the system compares timestamps: an older transaction (lower timestamp) is allowed to wait, while a younger transaction (higher timestamp) is rolled back ("dies"), so the older transaction can eventually get access to the locked data. When transactions are rolled back, they are restarted with their original timestamp (to give preference to older transactions).

  • Wound-Wait Scheme: The preemptive counterpart of wait-die: an older transaction requesting data held by a younger one forces ("wounds") the younger transaction to be rolled back, while younger transactions simply wait for older ones.

  • Timeout-based Schemes: The database sets a timeout whenever a transaction is waiting for a lock; when that timeout expires, the transaction is rolled back. This strategy has some problems: unnecessary rollbacks can happen, and starvation is also possible.
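As a small illustration (the function names and timestamp convention are ours, not from any particular database), the wait-die and wound-wait decision rules can be sketched as:

```python
# Illustrative sketch of the wait-die and wound-wait decision rules.
# Convention assumed here: a lower timestamp means an older transaction.

def wait_die(requester_ts: int, holder_ts: int) -> str:
    # Older requesters may wait; younger requesters die (roll back).
    return "wait" if requester_ts < holder_ts else "die"

def wound_wait(requester_ts: int, holder_ts: int) -> str:
    # Older requesters wound (preempt) the holder; younger requesters wait.
    return "wound holder" if requester_ts < holder_ts else "wait"

# T1 (ts=1) is older than T2 (ts=2):
print(wait_die(1, 2))    # -> wait
print(wait_die(2, 1))    # -> die
print(wound_wait(1, 2))  # -> wound holder
print(wound_wait(2, 1))  # -> wait
```

Note how the two schemes are symmetric: in wait-die the younger transaction is the one sacrificed when it requests, while in wound-wait the younger transaction is sacrificed when it holds the lock.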

Concurrency Control Schemes

In this section we will present some of the concurrency control schemes that database systems can implement:

Multiple Granularity

(todo)

Multiversion Schemes

Multiversion schemes work on the idea of keeping different (old) versions of each data item, in order to increase concurrency. This type of concurrency control can be described with three key ideas:

  • Whenever a write(Q) is issued, a new version of that data item is created
  • Timestamps are used to label versions
  • Upon a read(Q) performed by a transaction $t_i$, the value returned depends on the timestamp of $t_i$; an appropriate version must be chosen

In this chapter, we will explore two different variants of multiversion schemes:

Multiversion Timestamp Ordering

In multiversion timestamp ordering, we can establish the following principles:

  • Each data item $Q$ has its own set of versions: $\langle Q_1, Q_2, \ldots, Q_m \rangle$
  • Each version $Q_k$ has the following two timestamps:
    • W-timestamp - timestamp of the transaction that created version $Q_k$
    • R-timestamp - timestamp of the transaction with the highest timestamp that has read $Q_k$
Algorithm
Assumption

In the algorithm, a transaction $t_i$ can issue two operations: read(Q) and write(Q). Let $Q_k$ denote the version of $Q$ with the largest W-timestamp such that W-timestamp($Q_k$) ≤ TS($t_i$).

  1. If a transaction issues a read(Q), then:
    1. the value returned is that of $Q_k$
    2. if R-timestamp($Q_k$) < TS($t_i$), then set R-timestamp($Q_k$) = TS($t_i$)
  2. If a transaction issues a write(Q), then:
    1. If R-timestamp($Q_k$) > TS($t_i$), the transaction is rolled back (and restarted with a new timestamp)
    2. If W-timestamp($Q_k$) = TS($t_i$), then the contents of version $Q_k$ are overwritten
    3. Otherwise, a new version of $Q$ is created, with its W-timestamp and R-timestamp both set to TS($t_i$)

We can easily see that, using this algorithm, read requests never fail.
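The steps above can be sketched in code. This is a minimal, single-threaded illustration (class and field names are ours), just to make the version-selection and rollback rules concrete:

```python
# Illustrative sketch of multiversion timestamp ordering for one data item.
# Version 0 is treated as written "at time 0".

class RollbackError(Exception):
    """Raised when the protocol requires rolling back the writer."""

class MVTOItem:
    def __init__(self, initial):
        self.versions = [{"value": initial, "w_ts": 0, "r_ts": 0}]

    def _latest_leq(self, ts):
        # Q_k: the version with the largest W-timestamp <= ts
        return max((v for v in self.versions if v["w_ts"] <= ts),
                   key=lambda v: v["w_ts"])

    def read(self, ts):
        v = self._latest_leq(ts)
        v["r_ts"] = max(v["r_ts"], ts)   # bump R-timestamp if needed
        return v["value"]                # reads never fail

    def write(self, ts, value):
        v = self._latest_leq(ts)
        if v["r_ts"] > ts:               # a younger transaction already read Q_k
            raise RollbackError("roll back the writer")
        if v["w_ts"] == ts:              # same transaction rewrites its version
            v["value"] = value
        else:                            # otherwise, create a new version
            self.versions.append({"value": value, "w_ts": ts, "r_ts": ts})

q = MVTOItem(10)
q.write(5, 20)
print(q.read(4))   # -> 10 (sees the old version)
print(q.read(7))   # -> 20
```

After the `read(7)` above, a `write(6, ...)` would raise `RollbackError`, since the version written at timestamp 5 has already been read by a younger transaction (R-timestamp 7 > 6).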

Guarantees
  • This protocol guarantees serializability
  • However, it does not guarantee recoverability or cascadelessness

Snapshot Isolation

In snapshot isolation, the idea is to give each transaction, when it is created, a snapshot of the entire database; the transaction's updates and reads are done locally, on that snapshot. It is only at commit time that the system evaluates whether the database values can be updated, or whether the transaction needs to be rolled back. Just like in multiversion timestamp ordering, read requests never fail (and never need to wait).

Multiversioning in Snapshot Isolation

In snapshot isolation, both transactions and data items have timestamps:

  • Transactions have two types of timestamps:
    • StartTS($t_i$) - the timestamp at the time the transaction entered the system
    • CommitTS($t_i$) - the timestamp at the time the transaction commits
  • Data items only have one type of timestamp:
    • W-timestamp($Q_k$) - equal to the CommitTS($t_i$) of the transaction that created version $Q_k$

When a transaction $t_j$ reads a data item $Q$, it will read the most recent version $Q_k$ such that W-timestamp($Q_k$) ≤ StartTS($t_j$), and it will not see any updates with a timestamp higher than StartTS($t_j$).
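This visibility rule is easy to state in code. A small sketch (representation assumed: versions as `(w_timestamp, value)` pairs):

```python
# Sketch of the snapshot-isolation read rule: a transaction sees the most
# recent version committed no later than its StartTS.

def snapshot_read(versions, start_ts):
    """versions: list of (w_timestamp, value) pairs for one data item."""
    visible = [(w, val) for (w, val) in versions if w <= start_ts]
    return max(visible)[1]   # value with the largest visible W-timestamp

versions = [(0, "a"), (3, "b"), (8, "c")]
print(snapshot_read(versions, 5))   # -> "b": the version committed at ts=8
                                    #    is invisible to StartTS = 5
```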

Validation

During transaction execution, no conflicts arise, as transactions work on their own snapshot; conflicts only appear at commit time, so a validation policy is needed to decide which transactions to commit and which to roll back.

To define this validation policy, we first have to define what concurrent transactions are:

  • Two transactions $t_i$ and $t_j$ are said to be concurrent if:
    • StartTS($t_i$) ≤ StartTS($t_j$) ≤ CommitTS($t_i$), or
    • StartTS($t_j$) ≤ StartTS($t_i$) ≤ CommitTS($t_j$)

If these two transactions update the same data item, both will be operating on their own private snapshot, and when changes are committed, one of the updates will be overwritten by the other. There are two approaches to solve this:
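The definition of concurrent transactions above is just an interval-overlap test, which can be written directly (parameter names are ours):

```python
# Sketch of the "concurrent transactions" predicate from the definition above.

def concurrent(start_i, commit_i, start_j, commit_j):
    return (start_i <= start_j <= commit_i) or (start_j <= start_i <= commit_j)

print(concurrent(1, 10, 5, 12))   # -> True: the lifetimes overlap
print(concurrent(1, 4, 5, 9))     # -> False: t_j starts after t_i committed
```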

First Committer Wins

In this approach, at the commit time of transaction $t_i$, and for every data item $Q$ that $t_i$ updated, the database manager checks whether there is any version $Q_k$ such that:

  • StartTS($t_i$) < W-timestamp($Q_k$) < CommitTS($t_i$)

If there is, that means another transaction updated $Q$ while $t_i$ was executing. According to first committer wins, $t_i$ is not allowed to commit, and must be rolled back. If no $Q_k$ satisfies the condition above, then $t_i$ is allowed to commit.
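The validation check can be sketched as follows (a simplified, hypothetical interface: we pass in the W-timestamps of all versions of the items $t_i$ updated):

```python
# Sketch of first-committer-wins validation at commit time.

def can_commit(start_ts, commit_ts, w_timestamps_of_updated_items):
    """True iff no concurrent transaction committed a version of an
    updated item strictly between StartTS and CommitTS."""
    return not any(start_ts < w < commit_ts
                   for w in w_timestamps_of_updated_items)

# A concurrent transaction committed a version of Q at ts=7 -> must roll back.
print(can_commit(5, 9, [2, 7]))   # -> False
print(can_commit(5, 9, [2, 4]))   # -> True
```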

First Updater Wins

This approach relies on locks on data items: whenever a transaction wants to update $Q$, it must acquire an exclusive lock on $Q$. The algorithm, for a transaction $t_i$ trying to update $Q$, is:

  1. Transaction $t_i$ tries to acquire the lock on $Q$
  2. If it is able to get the lock:
    1. If $Q$ has been updated by a concurrent transaction (verify StartTS($t_i$) < W-timestamp($Q_k$) < CommitTS($t_i$)), $t_i$ needs to be rolled back
    2. Otherwise, $t_i$ is allowed to do the update
  3. If it is not able to get the lock, assuming $t_j$ holds the lock:
    1. Wait until $t_j$ commits or aborts
    2. If $t_j$ aborts, $t_i$ is able to get the lock; go to step 2
    3. If $t_j$ commits, $t_i$ must be rolled back
note

Locks are released when transactions commit or abort
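The steps above can be condensed into a single decision function. This is a sketch under simplifying assumptions (the locking and waiting are abstracted away; all names are ours):

```python
# Sketch of the first-updater-wins decision, with locking abstracted away.

def first_updater_decision(got_lock, updated_by_concurrent, holder_outcome=""):
    if got_lock:
        # Step 2: lock acquired; check for a concurrent update of Q.
        return "rollback" if updated_by_concurrent else "update"
    # Step 3: blocked; decide once the lock holder t_j finishes.
    if holder_outcome == "abort":
        return "retry"       # lock freed by the abort; go back to step 2
    return "rollback"        # the holder committed its update first

print(first_updater_decision(True, False))             # -> update
print(first_updater_decision(True, True))              # -> rollback
print(first_updater_decision(False, False, "abort"))   # -> retry
print(first_updater_decision(False, False, "commit"))  # -> rollback
```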